System and method for caching DRAM using an egress buffer

ABSTRACT

A system and method includes a server that includes a processor and a memory system that are coupled to a bus system. A network interface is coupled to the processor, and an egress buffer is coupled to the processor and the network interface by an egress bus.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Patent Application No. 60/345,315 filed on Oct. 22, 2001 and entitled “High Performance Web Server,” which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to microprocessors, and more particularly, to methods and systems for microprocessors to serve data from memory systems.

[0004] 2. Description of the Related Art

[0005] A typical server computer, such as a web server, has one main memory. A server serves data from the memory to a client computer that requested the data. FIG. 1 shows a typical web server 102 and client computer 110 that are linked by a network 104, such as the Internet or other network. FIG. 2 is a high-level block diagram of a typical web server 102. As shown, the web server 102 includes a processor 202, a memory system 203 that includes a ROM 204, a main memory DRAM 206 and a mass storage device 210, each connected by a peripheral bus system 208. The peripheral bus system 208 may include one or more buses connected to each other through various bridges, controllers and/or adapters, such as are well known in the art. For example, the peripheral bus system 208 may include a “system bus” that is connected through an adapter to one or more expansion buses, such as a Peripheral Component Interconnect (PCI) bus. Also coupled to the peripheral bus system 208 are a network interface 212, a number (N) of input/output (I/O) devices 216-1 through 216-N and a peripheral cryptographic processor 220. I/O devices 216-1 through 216-N may include, for example, a keyboard, a pointing device, a display device and/or other conventional I/O devices. Mass storage device 210 may include any suitable device for storing large volumes of data, such as a magnetic disk or tape, magneto-optical (MO) storage device, or any of various types of Digital Versatile Disk (DVD) or Compact Disk (CD) based storage.

[0006] Network interface 212 provides data communication between the computer system and other computer systems on the network 104. Hence, network interface 212 may be any device suitable for or enabling the web server 102 to communicate data with a remote processing system (e.g., client computer 110) over a data communication link, such as a conventional telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a cable modem, a satellite transceiver, an Ethernet adapter, or the like. The web server 102 typically processes large quantities of data, for example, streaming data such as streaming video or streaming audio or other types of data, or serving a website and other web data. FIG. 3 is a flowchart of the method operations 300 of the web server 102 serving a large volume of data such as a 10 MB data stream. In operation 305, the web server 102 receives a request for the 10 MB data stream from the client 110. If the 10 MB data stream requires processing such as being encrypted, then the 10 MB data stream must first be retrieved from the DRAM 206 into the processor 202. In operation 310, the data stream is retrieved from the DRAM 206 and/or other portions of the memory system 203.

[0007] In operation 315, the data stream is processed in the processor 202. In operation 320, the processed data stream is stored in the DRAM 206. In operation 325, the data stream is served through the network interface 212 to the network 104 to the client 110.

[0008] The processed data stream must be stored in the memory system 203 because the processor 202 and the network interface 212 typically have different data processing rates. By way of example, the processor 202 can process data at a rate of about 2 GHz or even greater. The peripheral bus system 208 typically operates at about 166 MHz; therefore, the network interface 212 typically does not operate as fast as 2 GHz and cannot serve the data as fast as the processor can process the data. As a result, the processed data must be temporarily stored in the memory system 203 so that the network interface 212 can serve the processed data at the optimal rate for the network interface 212. Alternatively, the network interface 212 may be able to output data faster than the processor can process the data; in that case, the processed data can be built up in the memory system 203 and the network interface 212 can serve the data from the memory system 203 at a high rate.

[0009] Now, as described in FIG. 3 above, the 10 MB data stream must transfer across the peripheral bus system 208 between the DRAM 206 and the processor 202 three times. Therefore, a 10 MB data stream being served results in a 30 MB data stream flowing between the DRAM 206 and the processor 202. These multiple passes between the DRAM 206 and the processor 202 consume a large portion of the total I/O bandwidth of the processor 202, which can limit the ability of the processor 202 to perform other operations besides serving the 10 MB data stream.

[0010] What is needed is a system and method to reduce the bandwidth usage of the processor to memory system interface.

SUMMARY OF THE INVENTION

[0011] Broadly speaking, the present invention fills these needs by providing a system and method for caching DRAM to reduce the bandwidth usage of the processor to memory system interface. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, computer readable media, or a device. Several inventive embodiments of the present invention are described below.

[0012] One embodiment includes a server that includes a processor and a memory system that are coupled to a bus system. A network interface is coupled to the processor, and an egress buffer is coupled to the processor and the network interface by an egress bus.

[0013] The processor can also include multiple processors. The multiple processors can be included on a first die or chip. Alternatively, the multiple processors can be included on multiple separate dies or chips.

[0014] The egress buffer can include a high-speed random access memory. In one embodiment, the egress buffer includes random access memory that has an operating speed of about 400 MHz.

[0015] The egress buffer and the egress bus can have a data throughput rate that is greater than or equal to about twice the amount of a data stream to be served.

[0016] The egress buffer can also include a double data rate buffer.

[0018] The egress bus has a bandwidth that is greater than or equal to about twice the amount of a data stream to be served. The egress bus can also include a 32-bit data bus.

[0019] One embodiment includes a system and method of serving data that includes receiving a request for data in a processor in a server. The requested data is retrieved. The retrieved data is processed in the processor. The processed data is stored in an egress buffer that is coupled to the processor and a network interface. The stored data is served from the egress buffer through the network interface.

[0020] The egress buffer is coupled to the processor and the network interface by an egress bus.

[0021] The requested data can include a data stream.

[0022] The egress bus has a bandwidth of about twice a bandwidth of the data stream.

[0023] The egress bus can include a 32-bit data bus.

[0024] The processed data can be stored in the egress buffer substantially simultaneously with the stored data being served from the egress buffer.

[0025] Processing the retrieved data in the processor can also include formatting the data, encrypting the data, and decrypting the data, among other processes.

[0026] Another embodiment includes a system and method of serving a data stream that includes receiving a request for a data stream in a processor in a server. The requested data stream is retrieved. The retrieved data stream is processed in the processor. The processed data stream is stored in an egress buffer that is coupled to the processor and a network interface by an egress bus. The egress bus has a bandwidth that is greater than or equal to about twice the data stream. The stored data stream is served from the egress buffer through the network interface. The data stream can include audio or video or any other streaming media.

[0027] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.

[0029] FIG. 1 shows a typical web server and client computer that are linked by a network, such as the Internet or other network.

[0030] FIG. 2 is a high-level block diagram of a typical web server.

[0031] FIG. 3 is a flowchart of the method operations of the web server serving a large volume of data such as a 10 MB data stream.

[0032] FIG. 4 shows a block diagram of a server in accordance with one embodiment of the present invention.

[0033] FIG. 5 is a flow chart of the method operations of serving data using an egress buffer in accordance with one embodiment of the present invention.

[0034] FIG. 6 shows a block diagram of a processor according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

[0035] Several exemplary embodiments for caching DRAM to reduce the bandwidth usage of the processor to memory system interface will now be described. It will be apparent to those skilled in the art that the present invention may be practiced without some or all of the specific details set forth herein.

[0036] One embodiment of the present invention includes an egress buffer that can be used to temporarily store processed data from the processor that will be served by the network interface. The egress buffer thereby reduces the demand on the bandwidth usage of the processor to memory system interface by about two-thirds.
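
By way of a non-limiting illustration, the two-thirds figure can be checked with simple arithmetic. The sketch below assumes, consistent with the 10 MB example in the background, that the conventional flow crosses the processor to memory system interface three times, and further assumes (an inference, not an express statement of the specification) that with the egress buffer only the initial retrieval crosses that interface. The function name and parameters are hypothetical.

```python
def memory_interface_traffic_mb(stream_mb: float, passes: int) -> float:
    """Data crossing the processor-to-memory interface, in MB (illustrative)."""
    return stream_mb * passes


# Conventional flow of FIG. 3: three passes over the interface.
conventional_mb = memory_interface_traffic_mb(10, 3)

# Assumed flow with an egress buffer: only the retrieval crosses the interface.
with_egress_mb = memory_interface_traffic_mb(10, 1)

# Fractional reduction in traffic on the processor-to-memory interface.
reduction = 1 - with_egress_mb / conventional_mb

print(f"{conventional_mb} MB -> {with_egress_mb} MB "
      f"(reduction of about {reduction:.0%})")
```

Under these assumptions the traffic drops from 30 MB to 10 MB, which corresponds to the reduction of about two-thirds stated above.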

[0037] FIG. 4 shows a block diagram of a server 400 in accordance with one embodiment of the present invention. The server 400 can be a web server or other type of server. The server 400 includes a bus system 408 that couples a processor 402 and a memory system 404. The processor 402 includes at least one processor core 402A. The server 400 also includes an egress buffer 420 that is coupled to the processor 402 and a network interface 412.

[0038] The egress buffer 420 is coupled to the processor 402 and the network interface 412 via a dedicated egress bus 422. The egress bus 422 can be as wide as necessary. For example, the egress bus 422 can be 32 bits (i.e., lines) wide, but the egress bus 422 could be narrower or wider, such as 16 bits or 64 bits. The egress buffer 420 can be large enough to buffer the desired data throughput of the network interface 412, as will be described in more detail below. Referring to the above example of a 10 gigabit data throughput, the egress buffer 420 would need to be 32 megabytes or possibly larger.

[0039] In one embodiment, the egress buffer 420 includes a very high-speed RAM such as a fast cycle time RAM (FCRAM) that operates as fast as about 400 MHz or more. The FCRAM allows the egress buffer 420 to serve the data across the egress bus 422 to the network interface 412 at the speed of the network interface 412.

[0040] In one embodiment, the server 400 can include multiple processors on multiple processor chips or dies. The egress bus 422 can also couple a single egress buffer 420 to all of the multiple processors. An egress bus controller can be included to manage the data flow between the multiple processors and the egress buffer 420.

[0041] FIG. 5 is a flow chart of the method operations 500 of serving data using an egress buffer in accordance with one embodiment of the present invention. In operation 505, a request for data is received in the server 400. The request can be from an application within the server 400 or due to a request received from an external data requester, such as a client computer 110 in FIG. 1 that is linked to the server 400 by a network.

[0042] The processor 402 retrieves the requested data, in operation 510. The data can be retrieved from numerous sources such as from the memory system 404 or other sources via the system data bus 408. In operation 515, the processor 402 processes the retrieved data, such as by packetizing the data or performing some other formatting, encryption, decryption, or other processing on the retrieved data.

[0043] The processed data is stored in the egress buffer 420 via the egress bus 422, in operation 520. In operation 525, the network interface 412, 412′ retrieves the processed data from the egress buffer 420 via the egress bus 422 and serves the data to the data requester.
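
By way of a non-limiting illustration, operations 510 through 525 can be sketched in software. The sketch is only a toy model under stated assumptions: the class `EgressBuffer`, the function `serve_request`, and the use of a dictionary for the memory system and a list for the network interface are all hypothetical stand-ins and are not part of the described hardware.

```python
from collections import deque


class EgressBuffer:
    """Toy stand-in for the egress buffer 420: a FIFO of processed chunks."""

    def __init__(self):
        self._chunks = deque()

    def store(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def drain(self):
        # Yield chunks in the order they were stored until the buffer is empty.
        while self._chunks:
            yield self._chunks.popleft()


def serve_request(key, memory, process, egress, send):
    """Hypothetical walk through operations 510-525 of FIG. 5."""
    data = memory[key]            # operation 510: retrieve the requested data
    processed = process(data)     # operation 515: process (format, encrypt, ...)
    egress.store(processed)       # operation 520: store in the egress buffer
    for chunk in egress.drain():  # operation 525: network interface serves it
        send(chunk)


# Usage: a dict models the memory system, a list models the network interface,
# and bytes.upper is an arbitrary placeholder for the processing step.
memory = {"stream": b"hello"}
sent = []
serve_request("stream", memory, bytes.upper, EgressBuffer(), sent.append)
print(sent)
```

In this sketch the processed data never returns to the modeled memory system; it moves from the processing step to the buffer and from the buffer to the sender.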

[0044] FIG. 6 shows a block diagram of a processor 402′ according to one embodiment of the present invention. The processor 402′ includes a processor core 402A′ and an integrated network interface 412′. Because the integrated network interface 412′ is included on the processor die 402′ with the processor core 402A′, the network interface 412′ can output data faster than the network interface 412 described in FIG. 4 above.

[0045] In one embodiment, a dedicated bus 422A couples the processor core 402A′ to the egress bus 422, through a process data switch 430. The process data switch 430 is also coupled to the network interface 412′ via a bus 422B. Alternatively, the network interface 412′ can be coupled to the egress buffer 420 by a separate, dedicated bus. The process data switch 430 directs the data from the processor core 402A′ to the egress buffer 420 or the memory system 404 and controls the data flow across the egress bus 422 so that the data flows either to the network interface 412′ or from the processor core 402A′.

[0046] In alternative embodiments, the egress bus 422 can also link other components on the processor die to the egress buffer 420.

[0047] In one embodiment, the egress buffer 420 is an about 400 MHz, double data rate (DDR) buffer. When combined with a 32-bit wide egress bus 422, a 400 MHz DDR buffer yields an effective 800 MHz × 32-bit wide egress bus 422 with a throughput of 3.2 GB per second, using a relatively small actual buffer of only two or four bits for each of the 32 bit lines of the egress bus 422. A 3.2 GB per second throughput of the egress bus 422 and egress buffer 420 equates to slightly more than 24 gigabits per second (25.6 gigabits per second). A 24 gigabit per second egress buffer 420 can support two 10-gigabit-per-second data streams: a first 10 gigabit data stream is input to the egress buffer 420 while a second 10 gigabit data stream is output from the egress buffer 420 to the network interface 412, 412′. The speed of the egress buffer 420 memory must be sufficient to support the network interface 412, 412′ data demand rate.
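
By way of a non-limiting illustration, the throughput arithmetic of the 400 MHz, 32-bit DDR example can be worked through as follows. The function name is hypothetical, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
def ddr_throughput_bits_per_s(clock_hz: int, bus_width_bits: int) -> int:
    """Peak DDR throughput: two transfers per clock on every bus line."""
    return clock_hz * 2 * bus_width_bits


bits_per_s = ddr_throughput_bits_per_s(400_000_000, 32)

gigabits_per_s = bits_per_s / 1e9       # 25.6
gigabytes_per_s = bits_per_s / 8 / 1e9  # 3.2

# Number of whole 10-gigabit-per-second streams the bus can carry at once,
# e.g. one stream into the egress buffer and one stream out of it.
streams_supported = bits_per_s // 10_000_000_000

print(gigabytes_per_s, gigabits_per_s, streams_supported)
```

The result, two concurrent 10-gigabit-per-second streams, matches the requirement that the egress bus carry about twice the bandwidth of the data stream being served.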

[0048] The egress buffer 420

[0049] Because the egress buffer 420 is coupled to the processor core 402A′ by the dedicated egress bus 422, the egress bus 422 can deliver the data much more quickly than a shared data bus such as the I/O interface 432 between the memory system 404 and the processor core 402A′. Further, because the egress buffer 420 uses a much higher-speed type of RAM (e.g., FCRAM), the egress buffer 420 can serve the data faster than standard DRAM.

[0050] The egress buffer 420 can also substantially smooth out the data interface between the data processing rate of the processor core 402A and the rate at which the network interface 412 can serve the data. Often the difference in processing rates (i.e., the transient variation) can vary as the processor performs other operations or as the network becomes busy and reduces the rate at which the network interface 412 can serve the data. The amount of transient variation that the egress buffer 420 can absorb increases as the size of the egress buffer 420 increases.
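
By way of a non-limiting illustration, the smoothing effect can be modeled with a simple discrete-time simulation. The model is hypothetical: the function `simulate`, the abstract data units, and the bursty production pattern are assumptions chosen only to show that a larger buffer absorbs a larger transient.

```python
def simulate(produced, served_per_tick, capacity):
    """Return (peak buffer occupancy, units the producer could not store).

    Each tick the processor offers produced[t] units; the buffer accepts what
    fits, then the network interface drains up to served_per_tick units.
    """
    level = 0
    peak = 0
    stalled_units = 0
    for offered in produced:
        accepted = min(offered, capacity - level)
        stalled_units += offered - accepted  # data the producer had to hold back
        level += accepted
        peak = max(peak, level)
        level -= min(level, served_per_tick)
    return peak, stalled_units


# Bursty producer (average 2 units per tick) against a steady 2-unit consumer.
burst = [4, 4, 0, 0, 4, 4, 0, 0]

small = simulate(burst, served_per_tick=2, capacity=4)
large = simulate(burst, served_per_tick=2, capacity=8)

print("capacity 4:", small)  # (4, 4): buffer fills and the producer stalls
print("capacity 8:", large)  # (6, 0): the whole burst is absorbed
```

In this model the smaller buffer saturates during each burst and forces the producer to hold back data, while the larger buffer absorbs the same bursts without stalling.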

[0051] The egress buffer 420 FCRAM can operate in any range from about 100 MHz or even slower to about 400 MHz or greater. The higher the speed of the egress buffer 420, the greater the efficiency of the processor serving the data to the network interface. Alternatively, a lower speed egress buffer 420 FCRAM would also increase the efficiency by reducing the demand across the system bus 408 and specifically across the interface between the memory system 404 and the processor 402.

[0052] The egress buffer 420 could be within a single die or chip with the processor 402. However, typically the egress buffer 420 would not be part of the processor die because the physical size of the memory is relatively large as compared to the size of the microprocessor devices in the processor 402, and therefore including the egress buffer is not an efficient use of the space on the processor die.

[0053] The network interface 412, 412′ can have any bandwidth, such as about 4 gigabits per second or about 10 gigabits per second. The network interface 412, 412′ has direct access to the egress buffer 420 via the dedicated egress bus 422.

[0054] As used herein the term “about” means +/−10%. By way of example, the phrase “about 250” indicates a range of between 225 and 275.
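
By way of a non-limiting illustration, the definition of “about” can be expressed as a range check. The function names are hypothetical, and treating the end points as included in the range is an assumption.

```python
def about_range(nominal: float, tolerance: float = 0.10):
    """Return the (low, high) bounds covered by "about nominal"."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def is_about(value: float, nominal: float, tolerance: float = 0.10) -> bool:
    """True if value lies within +/- tolerance of nominal (bounds included)."""
    low, high = about_range(nominal, tolerance)
    return low <= value <= high


low, high = about_range(250)
print(low, high)
print(is_about(260, 250), is_about(300, 250))
```

Applied to the 400 MHz figure used for the egress buffer, the same check would treat 380 MHz as “about 400 MHz” and 450 MHz as outside the range.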

[0055] With the above embodiments in mind, it should be understood that the invention might employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.

[0056] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

[0057] The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

[0058] It will be further appreciated that the instructions represented by the operations in FIG. 5 are not required to be performed in the order illustrated, and that all the processing represented by the operations may not be necessary to practice the invention. Further, the processes described in FIG. 5 can also be implemented in software stored in any one of or combinations of the RAM, the ROM, or the hard disk drive.

[0059] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

What is claimed is:
 1. A server comprising: a processor coupled to a bus system; a memory system coupled to the bus system; a network interface coupled to the processor; and an egress buffer coupled to the processor and the network interface by an egress bus.
 2. The server of claim 1, wherein the processor includes a plurality of processors.
 3. The server of claim 2, wherein the plurality of processors are included on a first die.
 4. The server of claim 2, wherein the plurality of processors are included on a plurality of dies.
 5. The server of claim 1, wherein the egress buffer includes high speed random access memory.
 6. The server of claim 1, wherein the egress buffer includes random access memory that has an operating speed of about 400 MHz.
 7. The server of claim 1, wherein the egress buffer and the egress bus have a data throughput rate that is greater than or equal to about twice the amount of a data stream to be served.
 8. The server of claim 1, wherein the egress buffer includes a double data rate buffer.
 9. The server of claim 1, wherein the egress bus has a bandwidth that is greater than or equal to about twice the amount of a data stream to be served.
 10. The server of claim 1, wherein the egress bus includes a 32-bit data bus.
 11. A method of serving data comprising: receiving a request for data in a processor in a server; retrieving the requested data; processing the retrieved data in the processor; storing the processed data in an egress buffer that is coupled to the processor and a network interface; and serving the stored data from the egress buffer through the network interface.
 12. The method of claim 11, wherein the egress buffer is coupled to the processor and the network interface by an egress bus.
 13. The method of claim 11, wherein the requested data includes a data stream.
 14. The method of claim 13, wherein the egress bus has a bandwidth of about twice a bandwidth of the data stream.
 15. The method of claim 13, wherein the egress bus includes a 32-bit data bus.
 16. The method of claim 11, wherein the processed data is stored in the egress buffer substantially simultaneously with the stored data being served from the egress buffer.
 17. The method of claim 11, wherein processing the retrieved data in the processor includes at least one of a group consisting of formatting the data, encrypting the data, and decrypting the data.
 18. A method of serving a data stream comprising: receiving a request for a data stream in a processor in a server; retrieving the requested data stream; processing the retrieved data stream in the processor; storing the processed data stream in an egress buffer that is coupled to the processor and a network interface by an egress bus having a bandwidth that is greater than or equal to about twice the data stream; and serving the stored data stream from the egress buffer through the network interface.
 19. The method of claim 18, wherein the data stream includes at least one of a group consisting of audio and video.